convolution kernel
Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
In the rapidly evolving field of deep learning, the demand for models that are both expressive and computationally efficient has never been more critical. This paper introduces Orchid, a novel architecture designed to address the quadratic complexity of traditional attention mechanisms without compromising the ability to capture long-range dependencies and in-context learning. At the core of this architecture lies a new data-dependent global convolution layer, which contextually adapts its kernel conditioned on input sequence using a dedicated conditioning neural network. We design two simple conditioning networks that maintain shift equivariance in our data-dependent convolution operation. The dynamic nature of the proposed convolution kernel grants Orchid high expressivity while maintaining quasilinear scalability for long sequences. We evaluate the proposed model across multiple domains, including language modeling and image classification, to highlight its performance and generality. Our experiments demonstrate that this architecture not only outperforms traditional attention-based architectures such as BERT and Vision Transformers with smaller model sizes, but also extends the feasible sequence length beyond the limitations of the dense attention layers. This achievement represents a significant step towards more efficient and scalable deep learning models for sequence modeling.
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
- Asia > India (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
- Health & Medicine (0.67)
- Information Technology (0.46)
- North America > Canada > Quebec > Montreal (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > Middle East > Israel > Southern District > Beer-Sheva (0.04)
- North America > United States (0.04)
- North America > Canada (0.04)
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- (5 more...)
Log-Polar Space Convolution Layers
Convolutional neural networks use regular quadrilateral convolution kernels to extract features. Since the number of parameters increases quadratically with the size of the convolution kernel, many popular models use small convolution kernels, resulting in small local receptive fields in lower layers. This paper proposes a novel log-polar space convolution (LPSC) layer, where the convolution kernel is elliptical and adaptively divides its local receptive field into different regions according to the relative directions and logarithmic distances. The local receptive field grows exponentially with the number of distance levels. Therefore, the proposed LPSC not only naturally encodes local spatial structures, but also greatly increases the single-layer receptive field while maintaining the number of parameters. We show that LPSC can be implemented with conventional convolution via log-polar space pooling and can be applied in any network architecture to replace conventional convolutions. Experiments on different tasks and datasets demonstrate the effectiveness of the proposed LPSC.
- North America > Canada > Ontario > Toronto (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- North America > United States > California > Santa Cruz County > Santa Cruz (0.14)
- Europe > United Kingdom > England > North Yorkshire > York (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- (2 more...)